This is the second assignment of Benedek Pásztor for Data Science 3 at CEU.

At the beginning, the necessary packages are loaded.

# install.packages("keras")  # run once to install the R package
library(keras)
# install_keras()            # run once to install the Keras backend
library(here)
## here() starts at /home/rstudio/keras_20190404
library(grid)
library(magick)  # not absolutely necessary
## Linking to ImageMagick 6.9.7.4
## Enabled features: fontconfig, freetype, fftw, lcms, pango, x11
## Disabled features: cairo, ghostscript, rsvg, webp

Task 1. - Deep learning on Fashion MNIST data

The Fashion MNIST dataset of clothing images is loaded, and the training and test sets are defined.

fashion_mnist <- dataset_fashion_mnist()
x_train <- fashion_mnist$train$x
y_train <- fashion_mnist$train$y
x_test <- fashion_mnist$test$x
y_test <- fashion_mnist$test$y

a. Show some example images from the data.

An example image is shown. There are 10 categories / classes in this dataset as indicated in the source: https://github.com/zalandoresearch/fashion-mnist
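For reference, the ten numeric labels can be mapped to readable names. The sketch below assumes the label order listed in the Fashion-MNIST repository linked above; `class_names` and `label_name()` are hypothetical helper names that are not part of the assignment.

```r
# Fashion-MNIST class names, indexed by label 0..9 (order as listed in the
# dataset's repository; the helper names here are illustrative only).
class_names <- c("T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
                 "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot")

# Labels are 0-based while R vectors are 1-based, hence the + 1.
label_name <- function(label) class_names[label + 1]

label_name(c(0, 1, 9))
```

With the data loaded, a call such as `label_name(y_train[18])` would give the class name of an example image.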

This example shows an image labeled as a T-shirt.

show_mnist_image <- function(x) {
  image(1:28, 1:28, t(x)[,nrow(x):1],col=gray((0:255)/255)) 
}

show_mnist_image(x_train[18, , ])

The next image is a T-shirt as well.

show_mnist_image(x_train[27, , ])

This one shows a pair of jeans.

show_mnist_image(x_train[98, , ])

b. Train a fully connected deep network to predict items.

- Normalize the data similarly to what we saw with MNIST.

At the beginning of the exercise, each 28 x 28 image is flattened into a vector of 784 values and the pixel intensities are scaled to the [0, 1] range so that Keras can work with them.

Afterwards, the class labels are one-hot encoded, so that each of the 10 classes is represented by its own binary indicator.

x_train <- array_reshape(x_train, c(dim(x_train)[1], 784)) 
x_test <- array_reshape(x_test, c(dim(x_test)[1], 784)) 

x_train <- x_train / 255
x_test <- x_test / 255

y_train <- to_categorical(y_train, 10)
y_test <- to_categorical(y_test, 10)
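
To make the two preprocessing steps concrete, here is a minimal base-R sketch on toy data. `one_hot()` is a hypothetical stand-in that mimics the kind of matrix `to_categorical()` returns; it is not the Keras implementation.

```r
# Toy pixel values in 0..255 and toy integer labels in 0..9.
pixels <- matrix(c(0, 51, 102, 255), nrow = 2)
labels <- c(0, 3, 9)

# Scaling: divide by the maximum pixel value so inputs lie in [0, 1].
scaled <- pixels / 255

# One-hot encoding: one row per observation, one column per class.
one_hot <- function(y, num_classes) {
  out <- matrix(0, nrow = length(y), ncol = num_classes)
  out[cbind(seq_along(y), y + 1)] <- 1  # labels are 0-based
  out
}

encoded <- one_hot(labels, 10)
range(scaled)     # 0 1
rowSums(encoded)  # 1 1 1
```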

- Experiment with network architectures and settings (number of hidden layers, number of nodes, activation functions, dropout, etc.)

Model 1 of non-CNN models

Now a first model is tested: a fully connected neural network with three dense layers. The first layer has 128 units and relu activation, followed by dropout at a rate of 0.2. Next comes another relu-activated layer with 50 nodes. The output layer has 10 nodes with softmax activation.
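
As an illustration of what a dropout rate of 0.2 means during training, the sketch below applies inverted dropout to a vector of activations in base R. It is a simplified stand-in (the function name `apply_dropout` is hypothetical), not the Keras layer itself.

```r
# Inverted dropout: zero each activation with probability `rate` and
# rescale the survivors by 1 / (1 - rate) so the expected value is unchanged.
apply_dropout <- function(activations, rate) {
  keep <- runif(length(activations)) >= rate
  activations * keep / (1 - rate)
}

set.seed(1)
a <- rep(1, 10000)
dropped <- apply_dropout(a, rate = 0.2)

mean(dropped == 0)  # share of zeroed units, close to 0.2
mean(dropped)       # close to 1, the original mean
```

At prediction time no units are dropped, which is why the rescaling is done during training.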

model <- keras_model_sequential() 
model <- model %>% 
  layer_dense(units = 128, activation = 'relu', input_shape = c(784)) %>%
  layer_dropout(rate = 0.2) %>%
  layer_dense(units = 50, activation = 'relu') %>%
  layer_dense(units = 10, activation = 'softmax')

summary(model)
## ___________________________________________________________________________
## Layer (type)                     Output Shape                  Param #     
## ===========================================================================
## dense_1 (Dense)                  (None, 128)                   100480      
## ___________________________________________________________________________
## dropout_1 (Dropout)              (None, 128)                   0           
## ___________________________________________________________________________
## dense_2 (Dense)                  (None, 50)                    6450        
## ___________________________________________________________________________
## dense_3 (Dense)                  (None, 10)                    510         
## ===========================================================================
## Total params: 107,440
## Trainable params: 107,440
## Non-trainable params: 0
## ___________________________________________________________________________
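
The parameter counts in the summary can be checked by hand. The short sketch below recomputes them in base R; `dense_params()` is a hypothetical helper written for this check.

```r
# A dense layer has one weight per input-output pair plus one bias per output.
dense_params <- function(n_in, n_out) n_in * n_out + n_out

layer_sizes <- c(784, 128, 50, 10)  # input, two hidden layers, output
params <- dense_params(head(layer_sizes, -1), tail(layer_sizes, -1))

params       # 100480 6450 510
sum(params)  # 107440
```

The dropout layer contributes no parameters, which matches the 0 shown for it in the summary.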

Categorical crossentropy is used as the loss function, and accuracy is tracked as the metric.

model %>% compile(
  loss = 'categorical_crossentropy',
  optimizer = optimizer_rmsprop(),
  metrics = c('accuracy')
)
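
The softmax output and the categorical crossentropy loss named above can be written out in a few lines of base R. This is a sketch for a single observation with made-up scores, not the code Keras runs internally.

```r
# Softmax turns raw scores into probabilities that sum to 1.
softmax <- function(z) {
  e <- exp(z - max(z))  # subtract the max for numerical stability
  e / sum(e)
}

# Categorical cross-entropy for one observation: minus the log of the
# probability assigned to the true class (y_true is a one-hot vector).
categorical_crossentropy <- function(y_true, y_prob) -sum(y_true * log(y_prob))

scores <- c(2, 1, 0.1)
probs  <- softmax(scores)
y_true <- c(1, 0, 0)

sum(probs)                               # 1
categorical_crossentropy(y_true, probs)  # about 0.417
```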

Training runs for 20 epochs with a batch size of 128, and 20% of the training data is held out for validation. This number of epochs was chosen because, in testing, accuracy did not improve further beyond it. In fact, validation accuracy is fairly stable after just a few epochs.

history <- model %>% fit(
  x_train, y_train, 
  epochs = 20, batch_size = 128, 
  validation_split = 0.2
)
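
The effect of `validation_split` and `batch_size` on the amount of data per update can be worked out directly. The sketch below assumes the standard 60,000 Fashion-MNIST training images; the variable names are illustrative.

```r
n_total          <- 60000  # Fashion-MNIST training images
validation_split <- 0.2
batch_size       <- 128

n_val   <- round(n_total * validation_split)  # 12000 held out for validation
n_train <- n_total - n_val                    # 48000 used for weight updates

# Number of gradient updates per epoch (the last batch may be smaller).
updates_per_epoch <- ceiling(n_train / batch_size)
updates_per_epoch  # 375
```

Keras takes the validation samples from the end of the supplied arrays before any shuffling, so the held-out 20% is the same in every epoch.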

With this model an accuracy of around 0.88 is reached.

model %>% evaluate(x_test, y_test)
## $loss
## [1] 0.3690644
## 
## $acc
## [1] 0.8802

The training history can be seen below:

plot(history)

Model 2 of non-CNN models

The second model uses a deeper network without any dropout.

Seven dense layers are used: the first has 128 nodes, the hidden layers then alternate between 50 and 100 nodes, and the last layer has 10 output nodes. Relu activation is used everywhere except in the last layer, which uses softmax.

model <- keras_model_sequential() 
model <- model %>% 
  layer_dense(units = 128, activation = 'relu', input_shape = c(784)) %>%
  layer_dense(units = 50, activation = 'relu') %>%
  layer_dense(units = 100, activation = 'relu') %>%
  layer_dense(units = 50, activation = 'relu') %>%
  layer_dense(units = 100, activation = 'relu') %>%
  layer_dense(units = 50, activation = 'relu') %>%
  layer_dense(units = 10, activation = 'softmax')

summary(model)
## ___________________________________________________________________________
## Layer (type)                     Output Shape                  Param #     
## ===========================================================================
## dense_4 (Dense)                  (None, 128)                   100480      
## ___________________________________________________________________________
## dense_5 (Dense)                  (None, 50)                    6450        
## ___________________________________________________________________________
## dense_6 (Dense)                  (None, 100)                   5100        
## ___________________________________________________________________________
## dense_7 (Dense)                  (None, 50)                    5050        
## ___________________________________________________________________________
## dense_8 (Dense)                  (None, 100)                   5100        
## ___________________________________________________________________________
## dense_9 (Dense)                  (None, 50)                    5050        
## ___________________________________________________________________________
## dense_10 (Dense)                 (None, 10)                    510         
## ===========================================================================
## Total params: 127,740
## Trainable params: 127,740
## Non-trainable params: 0
## ___________________________________________________________________________

Categorical crossentropy is used as the loss function, and accuracy is tracked as the metric.

model %>% compile(
  loss = 'categorical_crossentropy',
  optimizer = optimizer_rmsprop(),
  metrics = c('accuracy')
)

Training runs for 20 epochs with a batch size of 128, and 20% of the training data is held out for validation. This number of epochs was chosen because, in testing, accuracy did not improve further beyond it. In fact, validation accuracy is fairly stable after just a few epochs.

history <- model %>% fit(
  x_train, y_train, 
  epochs = 20, batch_size = 128, 
  validation_split = 0.2
)

With this model a test accuracy of around 0.88 is reached (0.8847 below), which is marginally higher than the 0.8802 of model 1.

model %>% evaluate(x_test, y_test)
## $loss
## [1] 0.3912158
## 
## $acc
## [1] 0.8847

The training history of model 2 can be seen below:

plot(history)

Model 3 of non-CNN models

The third model is very similar to the first one, but this time the hidden layers use the sigmoid activation function.

model <- keras_model_sequential() 
model <- model %>% 
  layer_dense(units = 128, activation = 'sigmoid', input_shape = c(784)) %>%
  layer_dropout(rate = 0.2) %>%
  layer_dense(units = 50, activation = 'sigmoid') %>%
  layer_dense(units = 10, activation = 'softmax')

summary(model)
## ___________________________________________________________________________
## Layer (type)                     Output Shape                  Param #     
## ===========================================================================
## dense_11 (Dense)                 (None, 128)                   100480      
## ___________________________________________________________________________
## dropout_2 (Dropout)              (None, 128)                   0           
## ___________________________________________________________________________
## dense_12 (Dense)                 (None, 50)                    6450        
## ___________________________________________________________________________
## dense_13 (Dense)                 (None, 10)                    510         
## ===========================================================================
## Total params: 107,440
## Trainable params: 107,440
## Non-trainable params: 0
## ___________________________________________________________________________
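
Since the activation function is the only change from model 1, a small base-R sketch of the two functions may help: sigmoid squashes every input into (0, 1), while relu sets negatives to zero and passes positives unchanged.

```r
relu    <- function(x) pmax(x, 0)
sigmoid <- function(x) 1 / (1 + exp(-x))

x <- c(-2, 0, 2)
relu(x)     # 0 0 2
sigmoid(x)  # about 0.119 0.500 0.881
```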

Categorical crossentropy is used as the loss function, and accuracy is tracked as the metric.

model %>% compile(
  loss = 'categorical_crossentropy',
  optimizer = optimizer_rmsprop(),
  metrics = c('accuracy')
)

Training runs for 20 epochs with a batch size of 128, and 20% of the training data is held out for validation. This number of epochs was chosen because, in testing, accuracy did not improve further beyond it. In fact, validation accuracy is fairly stable after just a few epochs.

history <- model %>% fit(
  x_train, y_train, 
  epochs = 20, batch_size = 128, 
  validation_split = 0.2
)

With this model a test accuracy of around 0.88 is reached (0.8789 below).

model %>% evaluate(x_test, y_test)
## $loss
## [1] 0.3404752
## 
## $acc
## [1] 0.8789

The training history of model 3 can be seen below:

plot(history)

Model 4 of non-CNN models

As a fourth model, a different batch size is used with model 1, the best-performing model so far. The batch size is increased to 200.

model <- keras_model_sequential() 
model <- model %>% 
  layer_dense(units = 128, activation = 'relu', input_shape = c(784)) %>%
  layer_dropout(rate = 0.2) %>%
  layer_dense(units = 50, activation = 'relu') %>%
  layer_dense(units = 10, activation = 'softmax')

summary(model)
## ___________________________________________________________________________
## Layer (type)                     Output Shape                  Param #     
## ===========================================================================
## dense_14 (Dense)                 (None, 128)                   100480      
## ___________________________________________________________________________
## dropout_3 (Dropout)              (None, 128)                   0           
## ___________________________________________________________________________
## dense_15 (Dense)                 (None, 50)                    6450        
## ___________________________________________________________________________
## dense_16 (Dense)                 (None, 10)                    510         
## ===========================================================================
## Total params: 107,440
## Trainable params: 107,440
## Non-trainable params: 0
## ___________________________________________________________________________
model %>% compile(
  loss = 'categorical_crossentropy',
  optimizer = optimizer_rmsprop(),
  metrics = c('accuracy')
)
history <- model %>% fit(
  x_train, y_train, 
  epochs = 20, batch_size = 200, 
  validation_split = 0.2
)

With this model an accuracy of around 0.88 is reached.

model %>% evaluate(x_test, y_test)
## $loss
## [1] 0.3497267
## 
## $acc
## [1] 0.8829

The training history can be seen below:

plot(history)

- Explain what you have tried, what worked and what did not. Present a final model.

We could see that the training accuracy kept increasing more steadily than the validation accuracy. The highest accuracy, 0.8911, was reached with model 4, that is, the first model trained with an increased batch size.

The final model hence uses relu activations and dropout: first a relu-activated layer with 128 nodes, then dropout at a rate of 0.2, then another relu-activated layer with 50 nodes, and finally an output layer with softmax activation and 10 nodes.

- Make sure that you use enough epochs so that the validation error starts flattening out - provide a plot about the training history (plot(history))

Training history plots are provided below each model.

- Evaluate the model on the test set. How does test error compare to validation error?

The test accuracy has been evaluated for every model so far in order to compare them. In most cases the validation accuracy is slightly higher than the test accuracy. The two are not equally honest indicators, however: the test error is preferable, as it is measured on data left out of training entirely.

c. Try building a convolutional neural network and see if you can improve test set performance. and d. Just like before, experiment with different network architectures, regularization techniques and present your findings

Model 1 of CNN models

To build a CNN in Keras, the input data must be shaped differently than before: each image keeps its 28 x 28 layout and gets an explicit channel dimension. The data is transformed accordingly below.

fashion_mnist <- dataset_fashion_mnist()
x_train <- fashion_mnist$train$x
y_train <- fashion_mnist$train$y
x_test <- fashion_mnist$test$x
y_test <- fashion_mnist$test$y

x_train <- array_reshape(x_train, c(nrow(x_train), 28, 28, 1))
x_test <- array_reshape(x_test, c(nrow(x_test), 28, 28, 1))

x_train <- x_train / 255
x_test <- x_test / 255

y_train <- to_categorical(y_train, 10)
y_test <- to_categorical(y_test, 10)

As an initial model, a six-layer CNN is built. First comes a convolutional layer with 32 filters, relu activation and a 3 x 3 kernel, then max pooling with a 2 x 2 pool size, followed by dropout at a rate of 0.25. The output is then flattened and passed to a relu-activated dense layer with 32 nodes and finally to a softmax-activated output layer with 10 nodes.
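
To illustrate the pooling step described above, here is a minimal base-R sketch of 2 x 2 max pooling on a single-channel matrix. The function `max_pool_2x2()` is a hypothetical helper for illustration and ignores channels and batches.

```r
# 2 x 2 max pooling with stride 2 on a single-channel matrix:
# each non-overlapping 2 x 2 block is replaced by its maximum.
max_pool_2x2 <- function(m) {
  rows <- seq(1, nrow(m) - 1, by = 2)
  cols <- seq(1, ncol(m) - 1, by = 2)
  out <- matrix(0, nrow = length(rows), ncol = length(cols))
  for (i in seq_along(rows)) {
    for (j in seq_along(cols)) {
      out[i, j] <- max(m[rows[i] + 0:1, cols[j] + 0:1])
    }
  }
  out
}

m <- matrix(1:16, nrow = 4)  # filled column by column
max_pool_2x2(m)              # rows: 6 14 / 8 16
```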

cnn_model <- keras_model_sequential() 
cnn_model %>% 
  layer_conv_2d(filters = 32,
                kernel_size = c(3, 3), 
                activation = 'relu',
                input_shape = c(28, 28, 1)) %>%
  layer_max_pooling_2d(pool_size = c(2, 2)) %>% 
  layer_dropout(rate = 0.25) %>%
  layer_flatten() %>% 
  layer_dense(units = 32, activation = 'relu') %>% 
  layer_dense(units = 10, activation = 'softmax')
summary(cnn_model)
## ___________________________________________________________________________
## Layer (type)                     Output Shape                  Param #     
## ===========================================================================
## conv2d_1 (Conv2D)                (None, 26, 26, 32)            320         
## ___________________________________________________________________________
## max_pooling2d_1 (MaxPooling2D)   (None, 13, 13, 32)            0           
## ___________________________________________________________________________
## dropout_4 (Dropout)              (None, 13, 13, 32)            0           
## ___________________________________________________________________________
## flatten_1 (Flatten)              (None, 5408)                  0           
## ___________________________________________________________________________
## dense_17 (Dense)                 (None, 32)                    173088      
## ___________________________________________________________________________
## dense_18 (Dense)                 (None, 10)                    330         
## ===========================================================================
## Total params: 173,738
## Trainable params: 173,738
## Non-trainable params: 0
## ___________________________________________________________________________
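
The output shapes and parameter counts in this summary follow from simple arithmetic, sketched below in base R. The helper names are hypothetical; the sketch assumes no padding and stride 1 for the convolution, which matches the shapes printed above.

```r
# Output side length of a convolution without padding and with stride 1.
conv_out <- function(n, kernel) n - kernel + 1
# Output side length of max pooling with stride equal to the pool size.
pool_out <- function(n, pool) n %/% pool

side <- pool_out(conv_out(28, 3), 2)  # 28 -> 26 -> 13
flat <- side * side * 32              # 13 * 13 * 32 = 5408

conv_params        <- (3 * 3 * 1 + 1) * 32  # 320: weights plus bias per filter
dense_layer_params <- flat * 32 + 32        # 173088
out_params         <- 32 * 10 + 10          # 330

conv_params + dense_layer_params + out_params  # 173738
```

Almost all parameters sit in the first dense layer, because it connects every one of the 5408 flattened values to 32 nodes.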

Categorical crossentropy is used as the loss function.

cnn_model %>% compile(
  loss = 'categorical_crossentropy',
  optimizer = optimizer_rmsprop(),
  metrics = c('accuracy')
)
history <- cnn_model %>% fit(
  x_train, y_train, 
  epochs = 20, batch_size = 128, 
  validation_split = 0.2
)

This model reaches a validation accuracy of 0.91. The test accuracy is very similar, again around 0.91.

cnn_model %>% evaluate(x_test, y_test)
## $loss
## [1] 0.2651655
## 
## $acc
## [1] 0.9137

The training history can be seen below:

plot(history)

Model 2 of CNN models

Now a variant of the network is trained with an increased batch size of 200. The convolution kernel stays at 3 x 3, while the max pooling layer's pool size is increased to 3 x 3. The dropout is kept at 0.25, and the relu-activated dense layer and the softmax output layer are kept as well.

cnn_model <- keras_model_sequential() 
cnn_model %>% 
  layer_conv_2d(filters = 32,
                kernel_size = c(3 , 3), 
                activation = 'relu',
                input_shape = c(28, 28, 1)) %>%
  layer_max_pooling_2d(pool_size = c(3, 3)) %>% 
  layer_dropout(rate = 0.25) %>%
  layer_flatten() %>% 
  layer_dense(units = 32, activation = 'relu') %>% 
  layer_dense(units = 10, activation = 'softmax')
summary(cnn_model)
## ___________________________________________________________________________
## Layer (type)                     Output Shape                  Param #     
## ===========================================================================
## conv2d_2 (Conv2D)                (None, 26, 26, 32)            320         
## ___________________________________________________________________________
## max_pooling2d_2 (MaxPooling2D)   (None, 8, 8, 32)              0           
## ___________________________________________________________________________
## dropout_5 (Dropout)              (None, 8, 8, 32)              0           
## ___________________________________________________________________________
## flatten_2 (Flatten)              (None, 2048)                  0           
## ___________________________________________________________________________
## dense_19 (Dense)                 (None, 32)                    65568       
## ___________________________________________________________________________
## dense_20 (Dense)                 (None, 10)                    330         
## ===========================================================================
## Total params: 66,218
## Trainable params: 66,218
## Non-trainable params: 0
## ___________________________________________________________________________
cnn_model %>% compile(
  loss = 'categorical_crossentropy',
  optimizer = optimizer_rmsprop(),
  metrics = c('accuracy')
)
history <- cnn_model %>% fit(
  x_train, y_train, 
  epochs = 20, batch_size = 200, 
  validation_split = 0.2
)

In this model, a validation accuracy of 0.9 is reached. This one resulted in a lower performance than model 1.

cnn_model %>% evaluate(x_test, y_test)
## $loss
## [1] 0.2756967
## 
## $acc
## [1] 0.9017

The training history can be seen below:

plot(history)

Model 3 of CNN models

Model 3 deals with a similar architecture as model 1, however, without the dropout. This is an experiment to see the effect of the dropout layer.

cnn_model <- keras_model_sequential() 
cnn_model %>% 
  layer_conv_2d(filters = 32,
                kernel_size = c(3, 3), 
                activation = 'relu',
                input_shape = c(28, 28, 1)) %>%
  layer_max_pooling_2d(pool_size = c(2, 2)) %>% 
  layer_flatten() %>% 
  layer_dense(units = 32, activation = 'relu') %>% 
  layer_dense(units = 10, activation = 'softmax')
summary(cnn_model)
## ___________________________________________________________________________
## Layer (type)                     Output Shape                  Param #     
## ===========================================================================
## conv2d_3 (Conv2D)                (None, 26, 26, 32)            320         
## ___________________________________________________________________________
## max_pooling2d_3 (MaxPooling2D)   (None, 13, 13, 32)            0           
## ___________________________________________________________________________
## flatten_3 (Flatten)              (None, 5408)                  0           
## ___________________________________________________________________________
## dense_21 (Dense)                 (None, 32)                    173088      
## ___________________________________________________________________________
## dense_22 (Dense)                 (None, 10)                    330         
## ===========================================================================
## Total params: 173,738
## Trainable params: 173,738
## Non-trainable params: 0
## ___________________________________________________________________________

Categorical crossentropy is used as the loss function.

cnn_model %>% compile(
  loss = 'categorical_crossentropy',
  optimizer = optimizer_rmsprop(),
  metrics = c('accuracy')
)
history <- cnn_model %>% fit(
  x_train, y_train, 
  epochs = 20, batch_size = 128, 
  validation_split = 0.2
)

Model 3 reaches a slightly lower test accuracy (about 0.90) than model 1. This suggests that dropout helps test accuracy, although it may lower the accuracy measured during training.

cnn_model %>% evaluate(x_test, y_test)
## $loss
## [1] 0.3104327
## 
## $acc
## [1] 0.9039

The training history can be seen below:

plot(history)

Model 4 of CNN models

Model 4 uses a deeper network to see how additional layers change performance. Convolution and max pooling are applied three times, followed by dropout at a rate of 0.25, flattening, a relu-activated dense layer and finally a softmax-activated layer for classification.

cnn_model <- keras_model_sequential() 
cnn_model %>% 
  layer_conv_2d(filters = 32,
                kernel_size = c(3, 3), 
                activation = 'relu',
                input_shape = c(28, 28, 1)) %>%
  layer_max_pooling_2d(pool_size = c(2, 2)) %>% 
  layer_conv_2d(filters = 32,
                kernel_size = c(3, 3), 
                activation = 'relu') %>%
  layer_max_pooling_2d(pool_size = c(2, 2)) %>% 
  layer_conv_2d(filters = 32,
                kernel_size = c(3, 3), 
                activation = 'relu') %>%
  layer_max_pooling_2d(pool_size = c(2, 2)) %>% 
  layer_dropout(rate = 0.25) %>%
  layer_flatten() %>% 
  layer_dense(units = 32, activation = 'relu') %>% 
  layer_dense(units = 10, activation = 'softmax')

The summary of the layers and the parameters can be seen below.

summary(cnn_model)
## ___________________________________________________________________________
## Layer (type)                     Output Shape                  Param #     
## ===========================================================================
## conv2d_4 (Conv2D)                (None, 26, 26, 32)            320         
## ___________________________________________________________________________
## max_pooling2d_4 (MaxPooling2D)   (None, 13, 13, 32)            0           
## ___________________________________________________________________________
## conv2d_5 (Conv2D)                (None, 11, 11, 32)            9248        
## ___________________________________________________________________________
## max_pooling2d_5 (MaxPooling2D)   (None, 5, 5, 32)              0           
## ___________________________________________________________________________
## conv2d_6 (Conv2D)                (None, 3, 3, 32)              9248        
## ___________________________________________________________________________
## max_pooling2d_6 (MaxPooling2D)   (None, 1, 1, 32)              0           
## ___________________________________________________________________________
## dropout_6 (Dropout)              (None, 1, 1, 32)              0           
## ___________________________________________________________________________
## flatten_4 (Flatten)              (None, 32)                    0           
## ___________________________________________________________________________
## dense_23 (Dense)                 (None, 32)                    1056        
## ___________________________________________________________________________
## dense_24 (Dense)                 (None, 10)                    330         
## ===========================================================================
## Total params: 20,202
## Trainable params: 20,202
## Non-trainable params: 0
## ___________________________________________________________________________
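
The shrinking feature maps in this summary can be traced with a short loop. The sketch below assumes 3 x 3 convolutions without padding and 2 x 2 pooling, as in the model definition; the variable names are illustrative.

```r
# Trace the feature-map side length through three convolution (3 x 3)
# and max-pooling (2 x 2) stages, starting from a 28 x 28 image.
side <- 28
trace <- side
for (stage in 1:3) {
  side <- side - 3 + 1   # convolution
  trace <- c(trace, side)
  side <- side %/% 2     # pooling (rounds down)
  trace <- c(trace, side)
}
trace  # 28 26 13 11 5 3 1

# Parameters of the second and third conv layers: 32 input channels each.
(3 * 3 * 32 + 1) * 32  # 9248
```

Only 1 x 1 x 32 = 32 values reach the dense layers. This may be one reason the deeper model scores lower, although that is an interpretation the experiment does not isolate.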

Categorical crossentropy is used as the loss function.

cnn_model %>% compile(
  loss = 'categorical_crossentropy',
  optimizer = optimizer_rmsprop(),
  metrics = c('accuracy')
)
history <- cnn_model %>% fit(
  x_train, y_train, 
  epochs = 20, batch_size = 128, 
  validation_split = 0.2
)

Model 4 reaches an accuracy of 0.87 on the validation set and 0.867 on the test set. It does not perform as well as the model with a single convolutional layer.

cnn_model %>% evaluate(x_test, y_test)
## $loss
## [1] 0.3662034
## 
## $acc
## [1] 0.867

The training history can be seen below:

plot(history)

Conclusion of task 1.

In conclusion, the convolutional neural networks performed better than the fully connected ones. The best CNN, model 1, reached a test accuracy of around 91%. Interestingly, a deeper network built on the same pattern (model 4) performed worse.

The best model (CNN model 1) starts with a convolutional layer with relu activation and a 3 x 3 kernel, then applies max pooling with a 2 x 2 pool size, followed by dropout at a rate of 0.25. The output is then flattened and passed to a relu-activated dense layer with 32 nodes and finally to a softmax-activated output layer with 10 nodes.

The best-performing non-CNN model was model 4, the one with the batch size increased to 200. Nonetheless, its performance was lower than that of the CNNs.
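
For a side-by-side view, the test accuracies printed by `evaluate()` in the sections above can be collected in one named vector. The names (`dense_1`, `cnn_1`, and so on) are labels chosen for this sketch.

```r
# Test accuracies as printed by evaluate() for each model above.
test_acc <- c(dense_1 = 0.8802, dense_2 = 0.8847, dense_3 = 0.8789,
              dense_4 = 0.8829, cnn_1 = 0.9137, cnn_2 = 0.9017,
              cnn_3 = 0.9039, cnn_4 = 0.8670)

names(which.max(test_acc))  # "cnn_1"
sort(test_acc, decreasing = TRUE)
```

The printed test accuracies of the four dense models lie within about 0.6 percentage points of each other, so their ranking may well differ between runs.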

Task 2. - Deep learning on Hot dog or not hot dog? data

a. Pre-process data so that it is acceptable by Keras (set folder structure, bring images to the same size, etc).

Before starting the exercise, let us get familiar with the images. Below is a picture labeled as a hot dog.

library(keras)
library(here)
library(grid)
library(magick)

example_image_path <- file.path(here(), "data/hot-dog-not-hot-dog/train/hot_dog/1000288.jpg")

image_read(example_image_path)

Then a random picture from the not-hot-dog class is checked as well.

example_image_path <- file.path(here(), "data/hot-dog-not-hot-dog/train/not_hot_dog/100945.jpg")

image_read(example_image_path)

The data comes already separated into a training and a test set. To support training, however, 80 randomly selected pictures out of the 250 test pictures were set aside as a validation set.
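
The split described above could be done along the following lines. This is only a sketch in a temporary directory with made-up placeholder files; the folder name `hot-dog-demo` and the file names are hypothetical. The counts 250 and 80 come from the text, which does not state whether they apply per class or in total, so a single folder is used here.

```r
# Hypothetical layout in a temporary directory; file names are made up.
root <- file.path(tempdir(), "hot-dog-demo")
unlink(root, recursive = TRUE)  # start from a clean slate
test_dir <- file.path(root, "test", "hot_dog")
val_dir  <- file.path(root, "validation", "hot_dog")
dir.create(test_dir, recursive = TRUE, showWarnings = FALSE)
dir.create(val_dir,  recursive = TRUE, showWarnings = FALSE)

# Create 250 empty placeholder "images" in the test folder.
invisible(file.create(file.path(test_dir, sprintf("img_%03d.jpg", 1:250))))

# Move a random sample of 80 files from test to validation.
set.seed(42)
to_move <- sample(list.files(test_dir), 80)
moved <- file.rename(file.path(test_dir, to_move), file.path(val_dir, to_move))

length(list.files(val_dir))   # 80
length(list.files(test_dir))  # 170
```

Moving rather than copying the files keeps the validation and test sets disjoint.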

train_datagen <- image_data_generator(rescale = 1/255)  

validation_datagen <- image_data_generator(rescale = 1/255)  

test_datagen <- image_data_generator(rescale = 1/255)  
image_size <- c(150, 150)
batch_size <- 30

train_generator <- flow_images_from_directory(
  file.path(here(), "data/hot-dog-not-hot-dog/train/"), 
  train_datagen,          
  target_size = image_size,  
  batch_size = batch_size,
  class_mode = "binary"       
)

validation_generator <- flow_images_from_directory(
  file.path(here(), "data/hot-dog-not-hot-dog/validation/"),   
  validation_datagen,
  target_size = image_size,
  batch_size = batch_size,
  class_mode = "binary"
)

test_generator <- flow_images_from_directory(
  file.path(here(), "data/hot-dog-not-hot-dog/test/"), 
  test_datagen,
  target_size = image_size,
  batch_size = batch_size,
  class_mode = "binary"
)

b. Estimate a convolutional neural network to predict if an image contains a hot dog or not. Evaluate your model on the test set.

The first CNN estimated on this data is fairly deep, with 10 layers. A convolution with a 3 x 3 kernel followed by max pooling is applied three times. A dropout of 0.25 and a flattening layer follow, then a relu-activated dense layer with 8 nodes and a final single-node sigmoid output layer.

Binary crossentropy is used as the loss function and accuracy as the performance indicator.
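
For reference, binary crossentropy can be written out in base R as below. This is a sketch on made-up labels and probabilities, not the Keras implementation.

```r
# Mean binary cross-entropy for labels in {0, 1} and predicted probabilities.
binary_crossentropy <- function(y_true, y_prob) {
  -mean(y_true * log(y_prob) + (1 - y_true) * log(1 - y_prob))
}

y_true <- c(1, 0, 1, 0)

# A model that always predicts 0.5 scores log(2), about 0.693.
binary_crossentropy(y_true, rep(0.5, 4))

# Confident, correct predictions give a much lower loss.
binary_crossentropy(y_true, c(0.9, 0.1, 0.9, 0.1))  # about 0.105
```

A loss near 0.69 is therefore what a model that outputs 0.5 for every image would obtain, which is a useful baseline when reading the evaluation results.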

hot_dog_model <- keras_model_sequential() 
hot_dog_model %>% 
  layer_conv_2d(filters = 32,
                kernel_size = c(3, 3), 
                activation = 'relu',
                input_shape = c(150, 150, 3)) %>%
  layer_max_pooling_2d(pool_size = c(2, 2)) %>% 
  layer_conv_2d(filters = 16,
                kernel_size = c(3, 3), 
                activation = 'relu') %>%
  layer_max_pooling_2d(pool_size = c(2, 2)) %>% 
  layer_conv_2d(filters = 16,
                kernel_size = c(3, 3), 
                activation = 'relu') %>%
  layer_max_pooling_2d(pool_size = c(2, 2)) %>% 
  layer_dropout(rate = 0.25) %>% 
  layer_flatten() %>% 
  layer_dense(units = 8, activation = 'relu') %>% 
  layer_dense(units = 1, activation = "sigmoid")  

hot_dog_model %>% compile(
  loss = "binary_crossentropy",
  optimizer = optimizer_rmsprop(lr = 2e-5),
  metrics = c("accuracy")
)
history <- hot_dog_model %>% fit_generator(
  train_generator,
  steps_per_epoch = 2000 / batch_size,
  epochs = 20,
  validation_data = validation_generator,
  validation_steps = 50
)
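
`steps_per_epoch` counts batches, so it is normally a whole number, while `2000 / batch_size` in the call above is not. The sketch below shows the arithmetic and a rounded-up alternative. The figure 2000 is taken from the code; the actual number of training images is not stated, so `n_assumed` is an assumption.

```r
batch_size <- 30
n_assumed  <- 2000  # the sample count implied by the fit_generator() call

n_assumed / batch_size           # about 66.67, not a whole number of batches
ceiling(n_assumed / batch_size)  # 67 batches cover every image at least once
```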

The model reaches a validation accuracy of just over 0.57. On the test set, as seen below, the accuracy stays slightly below 0.55.

hot_dog_model %>% evaluate_generator(test_generator, steps = 20)
## $loss
## [1] 0.6853814
## 
## $acc
## [1] 0.5362069

c. Could data augmentation techniques help with achieving higher predictive accuracy? Try some augmentations that you think make sense and compare

Two models are tested with augmentation.

Model 1 with augmentation

Data augmentation techniques often improve accuracy considerably. Instead of training only on the exact photos, the generator also produces rescaled, rotated, shifted (in width or height), sheared, zoomed or flipped versions of the images, which are then used during training as well.

Below, an example shows how a not-hot-dog picture can look after data augmentation, for instance rotated.
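
As a toy illustration of one such transformation, the sketch below flips a tiny matrix horizontally in base R. It only mimics the idea behind `horizontal_flip`; it is not how the Keras generator is implemented.

```r
# A tiny 2 x 3 "image"; each entry stands for one pixel intensity.
img_toy <- matrix(1:6, nrow = 2, byrow = TRUE)

# Horizontal flip: reverse the column order.
flipped <- img_toy[, ncol(img_toy):1]

img_toy  # rows: 1 2 3 / 4 5 6
flipped  # rows: 3 2 1 / 6 5 4
```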

# Define the augmenting generator first so that it is the one used below.
# Note: train_generator, created earlier, still uses the rescale-only
# generator unless it is rebuilt with flow_images_from_directory().
train_datagen <- image_data_generator(
  rescale = 1/255,
  rotation_range = 40,
  width_shift_range = 0.2,
  height_shift_range = 0.2,
  shear_range = 0.2,
  zoom_range = 0.2,
  horizontal_flip = TRUE,
  fill_mode = "nearest"
)

img <- image_load(example_image_path, target_size = c(150, 150))
x <- image_to_array(img) / 255
grid::grid.raster(x)

# Feed the single example image through the augmenting generator.
xx <- flow_images_from_data(
  array_reshape(x * 255, c(1, dim(x))), 
  generator = train_datagen
)

augmented_versions <- lapply(1:10, function(ix) generator_next(xx) %>%  {.[1, , , ]})

grid::grid.raster(augmented_versions[[3]])

image_read(augmented_versions[[9]])

The model itself is similar to the one used previously.

hot_dog_model <- keras_model_sequential() 
hot_dog_model %>% 
  layer_conv_2d(filters = 32,
                kernel_size = c(3, 3), 
                activation = 'relu',
                input_shape = c(150, 150, 3)) %>%
  layer_max_pooling_2d(pool_size = c(2, 2)) %>% 
  layer_conv_2d(filters = 16,
                kernel_size = c(3, 3), 
                activation = 'relu') %>%
  layer_max_pooling_2d(pool_size = c(2, 2)) %>% 
  layer_conv_2d(filters = 16,
                kernel_size = c(3, 3), 
                activation = 'relu') %>%
  layer_max_pooling_2d(pool_size = c(2, 2)) %>% 
  layer_dropout(rate = 0.25) %>% 
  layer_flatten() %>% 
  layer_dense(units = 8, activation = 'relu') %>% 
  layer_dense(units = 1, activation = "sigmoid")  

hot_dog_model %>% compile(
  loss = "binary_crossentropy",
  optimizer = optimizer_rmsprop(lr = 2e-5),
  metrics = c("accuracy")
)
history <- hot_dog_model %>% fit_generator(
  train_generator,
  steps_per_epoch = 2000 / batch_size,
  epochs = 20,
  validation_data = validation_generator,
  validation_steps = 50
)
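
As a side note on `steps_per_epoch`: the expression `2000 / batch_size` is a whole number only when the batch size divides 2000. The sketch below shows the arithmetic; 2000 is taken from the call above, while the batch sizes are illustrative values and not necessarily the ones used for this model. Rounding up with `ceiling()` is one common choice, not the only one.

```r
# Number of training images assumed by the call above.
n_train <- 2000

# Illustrative batch sizes (assumptions for this sketch).
batch_sizes <- c(32, 100, 128)

# Rounding up makes sure every image is visited at least once per epoch.
steps <- ceiling(n_train / batch_sizes)

print(data.frame(batch_size = batch_sizes, steps_per_epoch = steps))
```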

This model has not achieved a substantially better result. The validation accuracy stayed around 0.52, and the test-set evaluation printed below gives an accuracy of about 0.58.

hot_dog_model %>% evaluate_generator(test_generator, steps = 20)
## $loss
## [1] 0.6765995
## 
## $acc
## [1] 0.5844828

The training history can be seen below:

plot(history)

Model 2 with augmentation

The second model with augmentation allows for wider augmentation ranges. The priority here is to get as much performance as possible out of a plain CNN before turning to transfer learning.

# Wider shift, shear and zoom ranges than in the first augmented model.
# Note: train_generator has to be rebuilt from this train_datagen
# for the wider ranges to take effect during training.
train_datagen <- image_data_generator(
  rescale = 1/255,
  rotation_range = 40,
  width_shift_range = 0.3,
  height_shift_range = 0.3,
  shear_range = 0.4,
  zoom_range = 0.4,
  horizontal_flip = TRUE,
  fill_mode = "nearest"
)

img <- image_load(example_image_path, target_size = c(150, 150))
x <- image_to_array(img) / 255
grid::grid.raster(x)
xx <- flow_images_from_data(
  array_reshape(x * 255, c(1, dim(x))), 
  generator = train_datagen
)

augmented_versions <- lapply(1:10, function(ix) generator_next(xx) %>%  {.[1, , , ]})

grid::grid.raster(augmented_versions[[3]])

image_read(augmented_versions[[9]])

The model itself is similar to the one used previously. This time, however, a deeper network with five convolution and pooling blocks is used in the hope of reaching higher performance.

hot_dog_model <- keras_model_sequential() 
hot_dog_model %>% 
  layer_conv_2d(filters = 32,
                kernel_size = c(3, 3), 
                activation = 'relu',
                input_shape = c(150, 150, 3)) %>%
  layer_max_pooling_2d(pool_size = c(2, 2)) %>% 
  layer_conv_2d(filters = 16,
                kernel_size = c(3, 3), 
                activation = 'relu') %>%
  layer_max_pooling_2d(pool_size = c(2, 2)) %>% 
  layer_conv_2d(filters = 16,
                kernel_size = c(3, 3), 
                activation = 'relu') %>%
  layer_max_pooling_2d(pool_size = c(2, 2)) %>% 
  layer_conv_2d(filters = 16,
                kernel_size = c(3, 3), 
                activation = 'relu') %>%
  layer_max_pooling_2d(pool_size = c(2, 2)) %>% 
  layer_conv_2d(filters = 16,
                kernel_size = c(3, 3), 
                activation = 'relu') %>%
  layer_max_pooling_2d(pool_size = c(2, 2)) %>% 
  layer_dropout(rate = 0.25) %>% 
  layer_flatten() %>% 
  layer_dense(units = 8, activation = 'relu') %>% 
  layer_dense(units = 1, activation = "sigmoid")  
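
It can help to check how much spatial resolution survives the stacked blocks. The base-R sketch below assumes the keras defaults for the layers above (no padding in `layer_conv_2d`, pooling stride equal to `pool_size`); the helper `spatial_size_after` is hypothetical and not part of keras.

```r
# Side length of the feature map after n blocks of
# (3x3 convolution without padding) followed by (2x2 max pooling).
spatial_size_after <- function(size, n_blocks, kernel = 3, pool = 2) {
  for (i in seq_len(n_blocks)) {
    size <- size - (kernel - 1)  # unpadded convolution trims kernel - 1 pixels
    size <- size %/% pool        # pooling divides by the pool size, rounding down
  }
  size
}

side_3 <- spatial_size_after(150, 3)  # three blocks, as in the first model
side_5 <- spatial_size_after(150, 5)  # five blocks, as in this model

# Length of the flattened vector with 16 filters in the last convolution.
flat_3 <- side_3^2 * 16
flat_5 <- side_5^2 * 16

print(c(side_3 = side_3, flat_3 = flat_3, side_5 = side_5, flat_5 = flat_5))
```

Under these assumptions the five-block network hands a much shorter vector to the dense layers than the three-block one, which is worth keeping in mind when comparing the two models.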

hot_dog_model %>% compile(
  loss = "binary_crossentropy",
  optimizer = optimizer_rmsprop(lr = 2e-5),
  metrics = c("accuracy")
)
history <- hot_dog_model %>% fit_generator(
  train_generator,
  steps_per_epoch = 2000 / batch_size,
  epochs = 20,
  validation_data = validation_generator,
  validation_steps = 50
)

It is interesting to note that even with this second augmentation model, the validation accuracy did not increase substantially. In the test-set evaluation printed below, the accuracy is about 0.50, which is no better than the first augmented model.

hot_dog_model %>% evaluate_generator(test_generator, steps = 20)
## $loss
## [1] 0.6934816
## 
## $acc
## [1] 0.4965517
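
One way to read the loss values is to compare them with the binary cross-entropy of a model that always predicts a probability of 0.5. The sketch below computes that reference value in base R; the label vector is made up for illustration and `binary_crossentropy` is a hand-written helper, not the keras function.

```r
# Mean binary cross-entropy for labels y_true (0/1) and predicted probabilities p.
binary_crossentropy <- function(y_true, p) {
  -mean(y_true * log(p) + (1 - y_true) * log(1 - p))
}

# Made-up labels; with constant predictions of 0.5 the result
# does not depend on the labels.
y_toy <- c(0, 1, 1, 0, 1, 0)
p_const <- rep(0.5, length(y_toy))

chance_loss <- binary_crossentropy(y_toy, p_const)
print(chance_loss)
print(log(2))
```

The reference value equals log(2), roughly 0.693. The losses of the two augmented models (0.677 and 0.693) sit very close to it, which suggests that these networks learned little beyond predicting 0.5 for every image.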

The training history can be seen below:

plot(history)

d. Try to rely on some pre-built neural networks to aid prediction. Can you achieve a better performance using transfer learning for this problem?

In the last part of Task 2, transfer learning is used in the hope of reaching higher performance. The MobileNet architecture with its ImageNet weights serves as the pre-trained network.

model_imagenet <- application_mobilenet(weights = "imagenet")

Below we can see an example of the scores this pre-built network assigns to its ImageNet categories for one of our images. The goal is to build on this existing network to obtain a higher-performing model for the hot dog problem.

example_image_path <- file.path(here(), "data/hot-dog-not-hot-dog/train/not_hot_dog/106608.jpg")
img <- image_load(example_image_path, target_size = c(224, 224))  

x <- image_to_array(img)

x <- array_reshape(x, c(1, dim(x)))
x <- mobilenet_preprocess_input(x)

preds <- model_imagenet %>% predict(x)
mobilenet_decode_predictions(preds, top = 3)[[1]]
##   class_name class_description      score
## 1  n02909870            bucket 0.30173144
## 2  n02747177            ashcan 0.23374596
## 3  n03590841   jack-o'-lantern 0.09609534

The train, validation and test generators are defined, and data augmentation is again applied to the training set to help performance.

train_datagen = image_data_generator(
  rescale = 1/255,
  rotation_range = 40,
  width_shift_range = 0.2,
  height_shift_range = 0.2,
  shear_range = 0.2,
  zoom_range = 0.2,
  horizontal_flip = TRUE,
  fill_mode = "nearest"
)

validation_datagen <- image_data_generator(rescale = 1/255)  

test_datagen <- image_data_generator(rescale = 1/255)  

image_size <- c(128, 128)
batch_size <- 100 

train_generator <- flow_images_from_directory(
  file.path(here(), "data/hot-dog-not-hot-dog/train/"), 
  train_datagen,          
  target_size = image_size,  
  batch_size = batch_size,
  class_mode = "binary"    
)

validation_generator <- flow_images_from_directory(
  file.path(here(), "data/hot-dog-not-hot-dog/validation/"),   
  validation_datagen,
  target_size = image_size,
  batch_size = batch_size,
  class_mode = "binary"
)

test_generator <- flow_images_from_directory(
  file.path(here(), "data/hot-dog-not-hot-dog/test/"), 
  test_datagen,
  target_size = image_size,
  batch_size = batch_size,
  class_mode = "binary"
)
base_model <- application_mobilenet(weights = 'imagenet', include_top = FALSE,
                                    input_shape = c(image_size, 3))

freeze_weights(base_model)

predictions <- base_model$output %>% 
  layer_global_average_pooling_2d() %>% 
  layer_dense(units = 16, activation = 'relu') %>% 
  layer_dense(units = 1, activation = 'sigmoid')

model <- keras_model(inputs = base_model$input, outputs = predictions)

model %>% compile(
  loss = "binary_crossentropy",
  optimizer = optimizer_rmsprop(lr = 2e-5),
  metrics = c("accuracy")
)

model %>% fit_generator(
  train_generator,
  steps_per_epoch = 2000 / batch_size,
  epochs = 5, 
  validation_data = validation_generator,
  validation_steps = 50
)
model %>% evaluate_generator(test_generator, steps = 20)
## $loss
## [1] 0.5133703
## 
## $acc
## [1] 0.7176471

With this model based on transfer learning, a validation accuracy of about 0.72 was reached. The test-set evaluation printed above also gives about 0.72, which is the highest accuracy among the models in this task.
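
Because the base network is frozen, only the small head on top of it is trained. The sketch below counts the trainable parameters of that head in base R. It assumes that the MobileNet base (version 1, default width) outputs 1024 feature channels, which the global average pooling layer turns into a vector of length 1024; `dense_params` is a hypothetical helper.

```r
# Parameters of a dense layer: one weight per input-output pair plus one bias per output.
dense_params <- function(n_in, n_out) n_in * n_out + n_out

# Assumed number of channels coming out of the frozen MobileNet base.
n_features <- 1024

# Head used above: dense(16, relu) followed by dense(1, sigmoid).
head_params <- dense_params(n_features, 16) + dense_params(16, 1)
print(head_params)
```

Under this assumption the head has 16,417 trainable parameters, so the training run only has to fit a small classifier on top of features that were already learned on ImageNet.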

Conclusion of task 2

The simple CNN models, including the ones with data augmentation, performed very poorly. Although data augmentation is a powerful way to fight overfitting, the simple models built on this image set were not able to distinguish the categories.

Once transfer learning was used, a considerably better model was obtained, with an accuracy above 0.7. One lesson is that the pre-trained network behind the transfer learning model is very deep. Hence, it seems that this kind of recognition task needs very deep networks.